“[The Analytical Engine] might act upon other things besides number... supposing, 
for instance, that the fundamental relations of pitched sounds in the science of 
harmony and of musical composition were susceptible of such expression and 
adaptations, the engine might compose elaborate and scientific pieces of music of 
any degree of complexity or extent.” 

— Ada Lovelace, 1842 

“So who am I exactly? With known caveats, I am what people of the past have 
called ‘artificial intelligence.’” 

— Victor Pelevin, iPhuck 10, 2017 


Writing in 1842, Ada Lovelace imagines that in the future, Babbage’s Analytical Engine (a general-purpose programmable computer) will be able to create complex music. In the 2017 novel iPhuck 10 by the famous Russian writer Victor Pelevin, set in the late 21st century, the narrator is an algorithm that solves crimes and writes novels about them. Today we exist somewhere between these two visions of cultural AI (artificial intelligence). Algorithms are frequently used to write music, but they don’t really “understand” the human world and human meanings. Whether they ever will is unclear.



Frieder Nake. Hommage à Paul Klee, 13/9/65 Nr. 2. Screenprint created from a plotter drawing produced by a computer program written by Nake, 1965.

Harold Cohen. Amsterdam Suite C. Lithograph from a drawing generated by a computer program written by Cohen, 1977.

The original vision of AI was about the automation of cognition. Today, AI also plays a crucial role in culture, increasingly influencing our choices, behaviors, and imaginations. For example, it is used to recommend photos, videos, music, and other media. AI is also used to suggest people we should follow on social networks, to automatically beautify selfies and edit user photos to fit the norms of “good” photography, and to generate and control characters in computer games.

While artists have employed algorithms in artistic creation since the 1960s, today industrial-scale “cultural AI” is built into devices and services used by billions of people. Instead of being an instrument of a single artistic imagination, AI has become a mechanism for influencing the imaginations of billions. Gathered and aggregated data about the cultural behaviors of multitudes is used to model our “aesthetic self,” predicting our future aesthetic decisions and tastes, and potentially guiding us towards choices preferred by the majority.

The integration of AI into the everyday cultural lives of billions of people raises important questions about the future of culture, aesthetics, and taste. In this short book, I discuss some of these questions.



I.

AI and Production of Culture

THE MEANING OF “ARTIFICIAL INTELLIGENCE” 

In the original vision of AI in the 1950s-60s, the goal was to teach a computer to perform a range of cognitive tasks. According to this projection, a computer would simulate many operations of a single human mind. These included playing chess, solving mathematical problems, understanding written and spoken language, and recognizing the content of images. Sixty years later, AI has become a key instrument of modern economies, deployed to make them more efficient, secure, and predictable by automatically analyzing medical images, making decisions on consumer loans, filtering job applications, detecting fraud, and so on. AI is also seen as an enhancer of our everyday lives, saving us time and effort. A good example of this is the use of voice interfaces instead of typing.

But what exactly is “artificial intelligence” today? Besides the original tasks that defined AI, such as playing chess, recognizing objects in a photo, or translating between languages, computers today perform countless “intelligent” operations. For example, your smartphone’s keyboard gradually adapts to your typing style. Your phone may also monitor your usage of apps and adjust their work in the background to save battery. Your map app automatically calculates the fastest route, taking into account traffic conditions. There are thousands of intelligent, but not very glamorous, operations at work in phones, computers, web servers, and other parts of the IT universe.

Therefore, in one sense, AI is now everywhere. While some AI roles attract our attention, such as Google’s Smart Reply function that suggests automated email replies (used for 10% of all answers in Google’s Inbox app in 2017), many others operate in the gray everyday of digital society.

Why are some intelligent tasks that computers can accomplish seen as “real” AI, while others are not? Observers and historians of the AI field talk about the “AI effect.” It means that “when we know how a machine does something ‘intelligent,’ it ceases to be regarded as intelligent” (Promise of AI Not So Bright, 2006).

That is, after the AI field solves a problem and the solution is implemented in industry, it is no longer seen as part of the field. Paradoxically, we tend to see only the challenging, not-yet-solved problems as belonging to artificial intelligence, and this creates the impression that AI research has not been successful throughout its long history.

The dramatic increase in computing capacity, the ubiquity of digital devices and networks, and the challenges and opportunities brought by the “big data” trend of the 2000s have also affected AI. We moved from the automation of a single mind to a kind of “super-cognition.” Think, for example, of search engines such as Baidu, Yandex, Bing, and Google that continuously scan the web and index billions of websites and blogs. When you enter a search query, a search engine instantly returns relevant results drawn from such an index. No single human could ever perform such a feat. The scale of digital culture demands intelligence that is qualitatively similar to that of a human, but operates on a quantitatively different scale.

AESTHETIC AI

As I already pointed out, the original vision of AI was about the automation of cognition. Despite the difference in scale, super-cognition still follows this paradigm. So, when people talk about the great successes of AI in recent years, the examples used are the same tasks defined at the field’s beginnings many decades earlier: natural speech understanding, automated translation, and recognition of objects in photos. But what is perhaps less obvious is that AI now plays an equally important role in our cultural lives and behaviors, increasingly automating the processes of aesthetic creation and aesthetic choice.
Examples of automatic analysis of photographs by AI software developed by EyeEm. The computer rates the aesthetics of photographs and predicts tags describing their content. https://www.eyeem.com/eyeem-vision, accessed September 2018
Consider, for example, these selected cases of AI adoption in a single cultural field: digital photography. I divided them into two categories: assistance in selecting appropriate images from large (often massive) collections, and assistance in the creation and editing of new content. (Note that AI can either assist the human selection process or be employed for completely automatic selection.)

Selecting from existing content:

1. Image sharing services and marketplaces use AI to predict the content of images and assign keywords to them (Crigonis, 2016).

2. Instagram’s Explore screen recommends images and videos to each user based on a combination of many factors (not only on what the user liked in the past).

3. Yelp automatically selects the best photos to illustrate the numerous businesses it lists.

4. The Roll app from EyeEm automatically rates the aesthetic quality of user photos, while the EyeEm image marketplace assigns such scores to submitted photos.

5. Huawei ran a photo contest where submitted photos were judged by AI: “Trained using 4,000,000 images taken by professional photographers and picture editors, the AI will then give each photo a personalized AI score based on parameters such as focus, jitter, deflection, color, and composition” (Hillen, 2018).

Creation/editing of new content:

1. Photo apps can automatically modify captured photos according to the norms of “good photography.”

2. Other apps “beautify” selfies and portraits. The long list of capabilities offered in this area by Tencent (a leading Chinese IT company) demonstrates the range of possible automated adjustments: “dermabrasion, skin whitening, eye enlargement, face thinning, removing acne, adding eyelids, changing skin color, recognizing face color and applying foundation, applying lip gloss, eyebrow shaping, and applying other makeup” (Tencent YouTu Lab, 2018).

3. Photoshop 19.1 uses AI in its “Select Subject” function to automatically separate objects from a background.

4. Phone cameras analyze the 3D layout of a scene and blur backgrounds in portraits and selfies (Associated Press, 2016).

5. The Huawei Mate 10 phone camera (released in October 2017) uses AI to analyze what it sees. It then classifies the view into one of several scene types and selects the appropriate parameters for capturing that scene, even before you decide to take a photo.

Other applications of AI in photography are experimental at the time of writing this book, or are only emerging and have not yet been implemented in products. For example, EyeEm engineers described an experiment in which their system learned the styles of different photo curators from only 20 sample photos from each, and then selected similar images from EyeEm’s large collection so the curators could make further selections (Shaji & Yildirim, 2017). Google designed a system that mimics the skills of a professional photographer, such as selecting photos suitable for editing, cropping, and applying filters. Progress in commercial implementations of new ways for AI to understand aspects of photographs can be tracked by periodically visiting the website of Clarifai, one of the leading companies in this field (Clarifai, 2018).

Outside of photography, cultural uses of AI include music recommendations in Spotify, iTunes, and other music services; apps that automatically edit a user’s raw video to create short films in a range of styles; and the creation of new fashion items and styles (Shah, 2017). As the adoption of AI in culture continues to grow, the concept of AI is also changing, and it is challenging to come up with a definitive taxonomy to describe it. Building on the two categories I used to organize examples of AI applications in photography, here is one possible taxonomy of the types of cultural AI I see today:

1) Selecting content from larger collections: search, discovery, curation, recommendations, and filtering (Asrar, 2016).

2) Targeting content (e.g., one-to-one marketing, behavioral targeting, and market segmentation).

3) Assistance in the creation and editing of new content. (If we think of AI as intelligent in the biological sense, we can call this “participation” in content creation.)

4) Fully autonomous creation (e.g., AI composing music tracks in a particular style, writing business and sports news articles (Kafka, 2016), creating visualizations from given datasets, designing websites, generating email responses, etc.).

AI AND AESTHETIC DIVERSITY

One important trend we can see in the examples above is a movement towards the gradual automation (semi- or full) of aesthetic decisions: recommendation engines suggesting what we should watch, listen to, read, write, or wear; devices and services that automatically adjust the aesthetics of captured media to fit certain criteria; software that rates the aesthetic quality of our photos; and so on. This development raises big questions about the future of culture. Does such automation lead to a decrease in aesthetic diversity over time? Is this inevitable, or are there other forces that may counteract it, increasing diversity?

To illustrate what this means for image culture, the question may be rephrased as follows: Do the automated enhancements and edits that mobile cameras and photo sharing services apply to user photos make them less diverse aesthetically? Will further AI integration in user photography devices and image sharing sites lead to a standardization of “photo imagination”? Do search and recommendation engines, or functions such as Instagram’s Explore, tend to show the same images (or many variations of images with certain content, or perhaps only images with a certain “professional” aesthetic) to lots of people, thus diminishing the diversity of what we see?

But AI, algorithms, and the user interfaces of digital services, apps, and products may also be increasing aesthetic diversity. For example, digital cameras and photo apps have many functions for customization. On my camera (Fujifilm X-E3), I can choose the shutter speed, aperture, ISO, desired levels of highlight and shadow tone, color density, sharpening, grain, dynamic range, noise reduction, and film simulation filters (Fujifilm Corporation, 2017). Free photo editing apps such as Snapseed, which many people use to prepare their photos for Instagram, also offer a large number of editing tools comparable to those of professional desktop software such as Photoshop and Lightroom. Over time, phone cameras and photo editing apps have been adding more and more controls, and many of them are now free. Therefore, while the gradual integration of AI into phone cameras and sharing sites may contribute to a decrease in aesthetic diversity, the simultaneous addition of more and more controls to cameras and photo apps may have the opposite effect.

Or consider recommendation engines. They can be programmed to recommend items that are already the most popular among other users, thus decreasing your chances of seeing more varied items. Alternatively, they can be programmed to expose users to more diverse items, including ones they would most likely not find on their own. A person may use many engines for different media daily, and these engines may all be programmed differently, so it would be unwise to assume that all engines together push users in any one particular direction. And since any industry engine uses many different inputs to come up with its recommendations, the items it recommends to a user may expose her both to what is already popular and to what she would not find on her own. In one 2010 quantitative study, researchers set out to “investigate the impact of YouTube’s recommendation system on viewing diversity to understand whether the recommendation system helps users to discover videos of interest, but not necessarily popular [ones], or is more likely to recommend popular videos only.” Their conclusion was “that the current recommendation system helps to increase the diversity of video views in aggregation” (Zhou, Gao, & Khemmarat, 2010), though the result may differ at other times, as YouTube changes its algorithms.
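The two programming choices described above can be made concrete with a minimal Python sketch. It assumes a hypothetical `recommend` function that ranks unseen items by a single popularity count and, optionally, reserves a share of the slots (`explore_share`) for randomly chosen less popular items. Real industry engines combine many more inputs; the function, its parameters, and the mixing rule are illustrative assumptions, not how YouTube or any other service actually works.

```python
import random


def recommend(popularity, seen, k, explore_share=0.0, seed=0):
    """Return up to k item ids the user has not seen yet (hypothetical engine).

    popularity: dict mapping item id -> number of views by all users.
    seen: set of item ids the user already watched.
    explore_share: fraction of the k slots filled with randomly chosen items
        from the less popular "long tail" (0.0 = recommend by popularity only).
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    # Unseen items, most popular first.
    candidates = sorted(
        (item for item in popularity if item not in seen),
        key=lambda item: popularity[item],
        reverse=True,
    )
    n_explore = round(k * explore_share)
    n_popular = k - n_explore
    picks = candidates[:n_popular]     # the "already popular" slots
    tail = candidates[n_popular:]      # everything else is the long tail
    picks += rng.sample(tail, min(n_explore, len(tail)))  # the "exploration" slots
    return picks


views = {"a": 100, "b": 90, "c": 50, "d": 5, "e": 3, "f": 1}
print(recommend(views, seen={"a"}, k=2))                     # popularity only
print(recommend(views, seen={"a"}, k=2, explore_share=0.5))  # half the slots explore
```

With `explore_share=0.0` every user who has seen the same items gets the same list, which is the homogenizing case; raising it spreads attention across less popular items, which is one simple way an engine could be tuned toward the diversity-increasing behavior the study describes.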


Of course, besides computational technologies, a number of other trends also influence aesthetic diversity in contemporary culture. The rise of the world wide web and social networks, the growth of international travel, the globalization of consumer economies and advertising, zero-cost telecommunication, the growth of foreign student enrollment, the growth of remote work, and the rise of Japan, followed by Korea and then China, as exporters of cultural products and images are just some of the many developments that play a role. On the one hand, they are making the world into a single global village, or if you like, a single cultural marketplace, where certain images, ideas, values, narratives, products, and styles are marketed to everybody and available everywhere, and this may decrease diversity. On the other hand, the same trends may also be increasing diversity, because local cultural DNAs become available globally.

Given that many developments are influencing global aesthetic diversity, the role played by cultural AI is probably not yet the most significant, but it is likely to grow in the future for at least two reasons. First, the billions of people who still don’t have access to the internet and smartphones will get this access and start using the same AI-driven recommendation engines, automated aesthetic editing of captured media, selfie-beautifying apps, and so on. Second, the automation of aesthetic decisions we have seen so far is still at an early stage, with much more to come. For example, right now people take photos themselves, cameras apply some aesthetic adjustments at the time of capture, and then a person may use editing software to make further adjustments. But it is easy to imagine a future scenario where cameras will themselves choose what and when to capture, giving us the most satisfying photos that fit certain concepts and aesthetic ideals. In fact, the Google Clips video camera released in January 2018 already does this. The camera is fully AI-based. It uses computer vision to recognize people, pets, and certain emotional expressions, and it was trained by professional photographers to make “good” videos with proper composition, interesting actions, etc. (Lovejoy, 2018).

MEASURING DIVERSITY 

So how can we know whether aesthetic diversity in contemporary culture, or even only in one area such as photography, is growing or decreasing? Maybe we can use AI itself to start answering such questions more precisely, instead of guessing or following our intuitions, which are often wrong?

Since the middle of the 2000s, hundreds of thousands of computer and social scientists have been quantitatively analyzing massive samples of contemporary digital culture, including billions of posts and user interactions on Facebook, Twitter, Instagram, Flickr, Pinterest, and other social networks, as well as on recommendation sites such as Yelp, creative networks such as Behance, etc. (I will describe relevant examples of such research below.) They have developed many quantitative measures that describe, or attempt to describe, some aspects of culture, such as structures of sharing in social networks or the uniqueness and originality of user-created images. If we think within this paradigm, we can also propose measures of aesthetic diversity and apply them to some cultural areas and types of media. Since we can often access user content shared online in the past (for example, content shared on Flickr since 2004, or on Instagram since 2010), we can also calculate how diversity in some cultural areas is changing over time.

This would require developing formal measures of aesthetic diversity for different media and cultural fields, from fashion and interior design to cinema and music. This would be very useful in itself, because it would allow us to look at contemporary culture in new ways. Although such formal definitions will often fail to fully account for our aesthetic experiences, they can still help us by providing new concepts for thinking about global digital culture.
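The text does not commit to any particular formal measure, so the following is only one possible sketch. It borrows the Shannon diversity index from ecology, H = -Σ pᵢ ln pᵢ, where pᵢ is the share of items in category i, and computes it per year. It assumes that each shared photo has already been assigned a single style label (for example, by an image classifier) and that we have its upload year; both the labels and the `(year, label)` record format are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_diversity(labels):
    """Shannon index H = -sum(p_i * ln p_i) over the category labels.

    H is 0 when every item has the same label and grows as items spread
    more evenly across more categories.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def diversity_by_year(records):
    """records: iterable of (year, style_label) pairs -> {year: H}, sorted by year."""
    by_year = defaultdict(list)
    for year, label in records:
        by_year[year].append(label)
    return {year: shannon_diversity(labels) for year, labels in sorted(by_year.items())}


# Hypothetical labels for photos shared in two different years.
records = [(2010, "street"), (2010, "street"), (2018, "street"), (2018, "portrait")]
print(diversity_by_year(records))
```

Other indices from ecology, such as Simpson’s index, could be swapped in; each captures a somewhat different notion of diversity, which is part of why such measures would have to be developed separately for different media and fields.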

Note that we would need to differentiate between different types of diversity. One is the diversity of content, i.e., of the items being created in a particular cultural area. In photography, for example, this includes the diversity of subjects, techniques, and styles in photographs. Another is the diversity of users’ choices, similar to how it was analyzed in the YouTube study cited above. For example, contemporary fashion designers around the world may be creating items with very diverse styles, silhouettes, forms, volumes, materials, textures, and colors, but the diversity of the items being purchased and worn by people worldwide may be much smaller. Or it may be much bigger, since today many people mix different items in their outfits, creating composite looks that designers and retailers do not offer. Other types of diversity can also be defined as appropriate.

The idea of measuring aesthetic diversity in global contemporary culture allows us to make other interesting distinctions. One is between global and local diversity. If we measure a sufficient number of items globally, the culture of many local places may look very homogeneous compared to the full range of choices available worldwide. But if we zoom into such places, what looked like small hills will now look like mountains, so to speak: we will realize that these places are quite diverse when viewed on their own.
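The global/local distinction can be illustrated with two standard ecological measures. In this minimal sketch (the city names and color labels are invented, and each place is assumed to have at least one labeled item), a place’s richness, the number of distinct categories present there, is expressed as a share of global richness, while Pielou’s evenness, J = H / ln S, measures how evenly the place’s own items are spread over the S categories it actually uses.

```python
import math
from collections import Counter


def richness(labels):
    """Number of distinct categories present."""
    return len(set(labels))


def evenness(labels):
    """Pielou's evenness J = H / ln(S); 1.0 when all S categories are equally common."""
    counts = Counter(labels)
    s = len(counts)
    if s < 2:
        return 0.0  # a single category has no spread to measure
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(s)


def global_vs_local(items_by_place):
    """Compare each place against the global pool and against itself."""
    all_labels = [label for labels in items_by_place.values() for label in labels]
    g = richness(all_labels)
    return {
        place: {
            "share_of_global_richness": richness(labels) / g,  # outside, all-seeing view
            "local_evenness": evenness(labels),                # diversity within the place
        }
        for place, labels in items_by_place.items()
    }


outfits = {
    "city_a": ["gray", "white", "black", "gray", "white", "black"],
    "city_b": ["red", "blue", "green", "yellow", "gray", "white"],
}
print(global_vs_local(outfits))
```

Here `city_a` uses only 3 of the 7 colors found globally, so from the outside it looks homogeneous, yet within its own range its items are perfectly evenly spread (J = 1.0).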

Related to this distinction is another one, between objective and subjective measurements. (We can also call these the analysts’ vs. the users’ perspectives.) So far, we have assumed that our measurements are taken from an abstract, all-seeing point of view. All items or user choices are placed on a single scale and viewed and measured from the outside. This perspective is standard in biology when we measure Earth’s biodiversity, for example by counting the number of distinct species in a given habitat or on Earth as a whole. But in the case of culture, we may alternatively consider how the diversity of and differences between items inside a habitat are perceived by the members of this habitat. In this perspective, local traditions and conventions, and not only the objective characteristics of items or choices, determine whether they are perceived as radical. To continue with fashion examples, in many Western cities people commonly wear many colors, and this is not seen as radical. But in a city like Seoul, where palettes of grays, whites, and blacks dominate how people dress, outfits with very saturated colors, and especially with more than one such color together, will be noticed and perceived as being outside the norm.

LIMITS TO AUTOMATION 

Will AI replace professional cultural creators: media, industrial, and fashion designers, photographers and cinematographers, architects, urban planners, and so on? Will countries and cities worldwide compete over who can automate their creative industries more quickly and better? Will countries and cities (or individual companies) that figure out how best to combine AI with human skills and talents get ahead of the others?

Today AI gives us the option to automate our aesthetic choices (via recommendation engines), assists in certain areas of aesthetic production such as consumer photography, and automates other cultural experiences (for example, automatically selecting the ads we see online). But in the future, it will play a larger part in professional cultural production. Its use in helping to design fashion items, logos, music, TV commercials, and works in other areas of culture is already growing. But currently, human experts usually make the final decisions or do the actual production based on ideas and media generated by AI.

The well-known example of Game of Thrones (an American fantasy television drama series that premiered in 2011) is a case in point. The computer suggested plot ideas, but the actual writing and the show’s development were done by humans. We will only be able to talk about fully AI-driven culture when AI is allowed to create finished designs and media from beginning to end. In this future, humans will not decide whether these products should be shown to audiences; they will simply trust that AI systems know best, the way AI is already fully entrusted to choose when and where to show particular ads, as well as who should see them.

We are not there yet. For example, in 2016 IBM Watson created the first “AI-made movie trailer” for the feature film Morgan (Mix, 2016). However, the AI only chose various shots from the completed movie that it “thought” were suitable to include in the trailer, and a human editor did the final selection and editing. In another example, to create a system that would automatically suggest suitable answers to the emails users receive, Google employees first created a dataset of all such answers manually. The AI chooses which answers to suggest in each case, but it does not generate them. (The head of Google’s AI in New York explained that even one bad mistake in such a scenario could generate bad press for the company, so Google could not risk having AI come up with suggested answer sentences and phrases on its own.)

It is logical to think that any area of cultural production that either follows explicit rules or has systematic patterns can in principle be automated. Thus, many commercial cultural areas such as TV dramas, romance novels, professional photography, music videos, news stories, website and graphic design, and residential architecture are suitable for automation. For example, we can teach computers to write TV drama scripts, do food photography, or compose news stories in many genres (so far, AI systems are only used to automatically compose sports and business stories). So rather than asking whether any such area will be automated one day, we need to assume that it will happen and only ask “when.”

This sounds logical, but the reality is not so simple. Starting in the 1960s, artists, composers, and architects used algorithms to generate images, animations, music, and 3D designs (“Computer Art,” n.d.). Some of these works have entered the cultural canon. They display wonderful aesthetic inventiveness and refinement. However, in most cases they are abstract compositions with interesting and complex patterns, but without direct references to the human world. Think of such classics as the abstract images by Manfred Mohr (1969-1973) (Mohr, n.d.), John Whitney’s computer animation Arabesque (1975), or Iannis Xenakis’s musical compositions Atrées and Morsima-Amorsima (1962) (Maurer, 1999). There is no figuration in these algorithmically generated works, no characters as in novels, and no shots of the real world edited together into narratives as in feature films.

Now compare these abstract algorithmic classics with current attempts to automatically synthesize works that are about human beings, their worlds, their interests, emotions, and meanings. For example, Google Photos and Facebook today offer users slideshows and videos automatically edited from their photos. The results are sometimes entertaining and sometimes useful, but they cannot yet be compared to professionally created media. The same applies to images generated by Google engineers using the DeepDream neural net (2015-) and later by others who used the same technology (DeepDream, n.d.). In my view, these AI creations are more successful than the automatically generated slideshows of user photos, but not because DeepDream is a better AI. The reason is that 20th-century visual art styles tolerate more randomness and less precision than, for example, a photo narrative about a trip, which has distinct conventions and restrictions on what can be included and when. Thus, in the case of DeepDream, AI can create artistically plausible images that do refer to the human world, because we consider them “modern art” and expect great variability. But in the case of automatically edited slideshows, we immediately know that the computer does not really understand what it is selecting and editing together.






AI AND GENRE CONVENTIONS


Creating aesthetically satisfying and semantically plausible media artifacts about human beings and their world may only become possible after sufficient progress is made in “artificial general intelligence” (AGI, also referred to as “strong AI”). In other words, a computer would need to have approximately the same knowledge of the world as an adult human.

However, this is not only a matter of progress in AI research. Whether an algorithmic creation looks plausible or not also depends on genre conventions. In some cases, even very simple AI can produce satisfying results.

In 2002-2005, I collaborated with Andreas Kratky on Soft Cinema, a semi-automated system for making procedural narrative films (Manovich & Kratky, 2005; Manovich, 2005). Using software we developed, we produced three such films. Each film used a database of a few hundred short video shots we recorded. The choice of shots for the films and their editing in time was done by software using parameters we selected.

The project was shown as an installation in 45 exhibitions. For the duration of each exhibition, which sometimes lasted a few months, the software continually edited a film in real time by selecting short video clips from a database. The complete narrative for each film lasted between 7 and 13 minutes, depending on the film. After one version played, the computer immediately started editing and showing the next version.

Rather than trying to simulate the conventions of mainstream narrative cinema or documentary filmmaking, we took as our inspiration experimental 20th-century films. Specifically, we used the principle that narration and visuals do not always have to correspond.

As a voice-over narrated a story, a computer program selected short video clips from a database and edited them together using metadata and rules we established. The program also generated screen layouts that used a Mondrian-like grid to display anywhere from one to six videos in the same frame. The sequences of clips did not directly illustrate the narrative. However, since all clips in each film’s database shared certain semantic and visual references, the overall result of this semi-automated process looked meaningful. The mind of a viewer created connections between the narrative content and the visuals of the clips playing on the screen. Thus, the looser, more associative conventions of “avant-garde” or “experimental” cinema turned out to be much easier to simulate than those of a conventional narrative film.

The latter requires much tighter coordination between all shots. The viewer would immediately notice any mistakes the AI made, while in Soft Cinema mistakes were not possible in principle, since the selection and screen placement of clips did not directly illustrate the narrative.
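The kind of metadata-driven editing described above can be sketched in a few lines. This is not the Soft Cinema software, whose rules are not given here; it is a hypothetical illustration assuming that each clip carries a set of keyword tags, that a clip qualifies when it shares at least one tag with the current narration segment, and that the number of panels in the grid is chosen at random between one and six. The `Clip` class, the tags, and the seeded random ordering are all invented for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    name: str
    seconds: float
    tags: frozenset  # hypothetical metadata keywords, e.g. {"street", "night"}


def edit_sequence(clips, wanted_tags, target_seconds, seed=0):
    """Pick clips sharing at least one tag with wanted_tags, in a seeded random
    order, until the sequence reaches target_seconds or qualifying clips run out."""
    rng = random.Random(seed)
    pool = [clip for clip in clips if clip.tags & wanted_tags]  # metadata rule
    rng.shuffle(pool)                                           # loose, associative order
    sequence, total = [], 0.0
    for clip in pool:
        if total >= target_seconds:
            break
        sequence.append(clip)
        total += clip.seconds
    return sequence


def choose_layout(rng):
    """Number of videos shown at once in the grid: one to six."""
    return rng.randint(1, 6)


database = [
    Clip("tram", 4.0, frozenset({"city"})),
    Clip("harbor", 3.0, frozenset({"sea"})),
    Clip("neon", 5.0, frozenset({"city", "night"})),
    Clip("window", 2.0, frozenset({"night"})),
]
segment = edit_sequence(database, {"city"}, target_seconds=6.0)
print([clip.name for clip in segment], choose_layout(random.Random(1)))
```

Because the selection only requires a shared tag rather than a match to what the narrator is saying, no choice can be strictly “wrong,” which mirrors the point made above about why this genre is easier to simulate than conventional narrative film.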

However, it is likely that we will also gradually learn to use AI to generate works in cultural genres that have many rules and constraints. For example, synthesized 3D human characters and conversational agents are gradually becoming more realistic. This is a very challenging area, because human verbal communication is a very strict “genre”: if a speaker makes a gesture or facial expression that does not correspond to what he or she is saying, we notice immediately. The use of algorithms in design and architecture (often called “parametric design”) is another mature area. Perhaps one day we will be able to delegate to computers the creation of all details of an entire city, from planning to architecture, landscaping, traffic management, and all infrastructure. But when this happens, will we be able to prevent such an AI-metropolis from imposing on, regulating, and rationalizing our existence in the name of progress and human happiness, like the urban visions of the 17th-19th centuries, European utopias of ideal human communities, Le Corbusier’s radical urban rationalization proposals of the 1920s, and the cybernetic and mechanical visions in Godard’s Alphaville and Tati’s Playtime?

If all creative and knowledge work becomes the domain of AI, what will be left for humans? What will be the purpose of our existence? Watching endless films created by AI, listening to AI-generated music, and being driven in driverless cars around AI-generated cities?

Many modern thinkers and artists have envisioned a future where humans, liberated by machines from mechanical and boring work, will engage only in play and art (e.g., Constant’s New Babylon). But if the automation of cultural production by AI continues, eventually it will be AI systems playing and making art, not us.

AI AS A CULTURE THEORIST


Automation of cultural production may use AI based on a system of explicit rules, or it may use a different approach called “supervised machine learning.” Advances in “deep learning” (particular methods for supervised machine learning) in the 2010s have made the latter approach very popular today. In one common cultural application of supervised learning, a computer is first fed many examples of works of particular genres, styles, and media, and it gradually learns the patterns common to each genre or style. After that, the computer is given new works it has not seen before and uses the patterns it learned to classify those works. For example, to create the trailer for the movie Morgan, a computer was given 100 horror movies. In these movies, “each scene was tagged with an emotion from a broad bank of 24 different emotions and labels from across 22,000 scene categories, such as eerie, frightening, and loving” (Smith, 2016). The computer learned “the types of scenes that categorically fit into the structure of a suspense/horror movie trailer.” Then, when it was given Morgan, it selected the scenes that it judged to be the best ones for the trailer.
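To make the train-then-classify loop described above concrete, here is a deliberately tiny sketch. It is not the system used for the Morgan trailer: it assumes each scene has already been reduced to a short numeric feature vector (hypothetical suspense and brightness scores) and swaps in a nearest-centroid classifier, one of the simplest supervised methods, for a deep neural network. The labels and numbers are invented for illustration.

```python
from math import dist
from statistics import mean

def train_centroids(examples):
    """Learn one average feature vector (centroid) per label from labelled examples."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }

def classify(centroids, features):
    """Assign an unseen example to the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical (suspense, brightness) scores for hand-tagged scenes.
training = [
    ((0.9, 0.2), "trailer"),
    ((0.8, 0.1), "trailer"),
    ((0.2, 0.8), "skip"),
    ((0.1, 0.9), "skip"),
]
centroids = train_centroids(training)

# A new, unseen scene is classified using only the learned centroids.
print(classify(centroids, (0.85, 0.15)))
```

A centroid is a readable summary of what was learned; the learned parameters of a deep network usually are not, which is the contrast the next paragraphs turn on.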

We can say that AI, implemented in this way, acts as a cinema (or art, video game, fashion, etc.) theorist and historian. Human researchers in these fields also study many works created in particular places and historical periods to find common patterns. Their findings become part of the history and theory of the field.

However, there is a crucial difference between an “AI culture theorist” and a human theorist or historian. The latter comes up with explicit principles that describe how a cultural area functions. For example, the standard textbook used in numerous university film studies courses, The Classical Hollywood Cinema, provides answers to questions such as “How does the typical Hollywood film use techniques and storytelling forms of the film medium?” (Bordwell, Staiger, & Thompson, 2010). But at present, the frequent result of training AI (i.e., deep neural networks) on many cultural examples is a black box. Given new examples, it can classify them correctly; for example, it can decide whether a particular film belongs to “classical Hollywood cinema” or not. But we often don’t know how the neural network came up with this decision. Similarly, a neural net can be trained to distinguish between works of different artists, fashion designers, or film directors, and it can also generate new objects in the same style. But often we don’t know what exactly the computer has learned. (However, many computer scientists are working on methods that make the black boxes created by neural networks more transparent and possible to “audit.”)

This is one of the key issues surrounding cultural uses of AI. Are the results of machine learning interpretable, or are they only a black box which is efficient in production but useless for human understanding of the domain? Will the expanding use of machine learning to create new cultural objects make explicit the patterns in many existing cultural fields that we may not be aware of? And if it does, will it be in a form that will be understandable for people without degrees in computer science? Will the companies deploying machine learning to generate movies, ads, designs, images, music, urban designs, etc. expose what their systems have learned?




II. 


AI and Analysis of Culture

QUANTITATIVE ANALYSIS OF LARGE CULTURAL DATA 

Most of my examples of practical cultural uses of AI listed above involve commercial devices, services, or research done in companies. But companies are not the only players in cultural AI. Many academic researchers have been using various computer methods to quantitatively study culture using large datasets of cultural objects, or records of users’ interactions and behaviors. This research is published in academic journals or presented at conferences, so it is accessible, and the code and datasets used in studies are often made available as well. (Note that computer scientists working in companies also periodically publish papers describing in detail the techniques used in these companies, such as the description of Pinterest’s visual recommendation engine [Jing, et al., 2015].)

Much of such research in computer science uses large samples of user content shared on social networks and data about people’s behaviors on these networks, such as numbers of views, likes, and shares. Researchers have published hundreds of thousands of papers that computationally analyze the characteristics of image, video, and text posts, as well as user behavior on the most popular social networks and media sharing services such as Weibo, Facebook, Instagram, Flickr, YouTube, Pinterest, and Tumblr. In one research area called “computational aesthetics,” scientists create mathematical models that predict which images and videos will be popular, and how this popularity is affected by their content and other characteristics such as “memorability,” “interestingness,” “beauty,” or “creativity” (Redi, O’Hare, Schifanella, Trevisiol, & Jaimes, 2014; Schifanella, Redi, & Aiello, 2015). (The researchers have also proposed metrics to measure these characteristics.)

Let’s look at computer science publications that quantitatively analyze Instagram as an example. To locate only quantitative papers, I usually add the word “dataset” to the name of the social network I want to find research about, and then search for this combination of words on Google Scholar. The search for “Instagram dataset” conducted on July 15, 2017 returned 9,210 journal articles and conference papers. Here are some of these papers from 2014-2017 that will give you an idea of the topics pursued by researchers.

One publication analyzed the most popular Instagram subjects and types of users in terms of what kinds of subjects frequently appear in photos in their feeds (Hu, Manikonda, & Kambhampati, 2014). Another paper used a sample of 4.1 million Instagram photos to quantify the effect of using filters on numbers of views and comments (Bakhshi, Shamma, Kennedy, & Gilbert, 2015). In another paper, a group of researchers analyzed temporal and demographic trends in Instagram selfies using 5.5 million photos with faces they collected from the network. They also tested three alternative hypotheses about the reasons behind posting selfies in each of the 117 countries in their dataset (Souza, et al., 2015). Yet another paper investigated clothing and fashion styles in 44 world cities using 100 million Instagram photos (Matzen, Bala, & Snavely, 2017).

Other publications use computational methods to analyze patterns in historical culture using the available digitized archives. Interesting examples of such research are Toward Automated Discovery of Artistic Influence (Saleh, Abe, Arora, & Elgammal, 2014), Measuring the Evolution of Contemporary Western Popular Music (Serra, Corral, Boguna, & Haro, 2012), Shot durations, shot classes, and the increased pace of popular movies (Cutting & Candan, 2015), and OmniArt: Multi-task Deep Learning for Artistic Data Analysis (Strezoski & Worring, 2017).

The first paper presents a mathematical model for the automated discovery of influence between artists. The model was tested using 1,710 images of paintings by 66 well-known artists. While some of the discovered influences are the same ones often described by art historians, the model also suggests other visual influences between artists that had not been discussed previously. The second paper investigates changes in popular music using a dataset of 464,411 songs produced between 1955 and 2010. The dataset included “a variety of popular genres, including rock, pop, hip hop, metal, or electronic.” The authors concluded that over time, there was “the restriction of pitch transitions” and “the homogenization of the timbral palette” — in other words, some of the musical variability has decreased. The third paper analyzes gradual changes in average shot duration across 9,400 English-language narrative films created between 1912 and 2013. The fourth paper uses machine learning to predict the artist name, period, and material of artworks from a dataset of 432,217 historical art images.
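One hedged way to put a number on the kind of “restriction of pitch transitions” the second paper reports is Shannon entropy: the more evenly a melody spreads over many different note-to-note transitions, the higher the entropy. The sketch below illustrates that idea under this assumption only; it is not the method used by Serra and colleagues, and the two melodies are invented pitch-class sequences.

```python
from collections import Counter
from math import log2

def transition_entropy(notes):
    """Shannon entropy, in bits, of the distribution of consecutive note-to-note transitions."""
    transitions = Counter(zip(notes, notes[1:]))
    total = sum(transitions.values())
    return -sum((n / total) * log2(n / total) for n in transitions.values())

# Hypothetical melodies written as pitch-class sequences (not data from the 2012 study).
varied = ["C", "E", "G", "B", "D", "F", "A", "C"]
repetitive = ["C", "G", "C", "G", "C", "G", "C", "G"]

print(f"varied: {transition_entropy(varied):.2f} bits")
print(f"repetitive: {transition_entropy(repetitive):.2f} bits")
```

The lower value for the second melody corresponds to a “restriction” in this hypothetical sense: fewer distinct transitions, used more predictably.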

THE LIMITS OF STATISTICAL REASON 

These and many other studies that analyze large cultural samples and collections using AI and statistical methods present valuable and original insights, which would be impossible to arrive at without such datasets and methods, using only “armchair” academic theorizing or small-group ethnographic observations. But because most of these studies use large datasets and a statistical approach, they also share some common limitations.

The discipline of statistics has a few areas, and one of them (historically the earliest) is descriptive statistics. Here “statistics” refers to a summary of a collection of information. For example, we can measure the height of a large number of people in a particular demographic group in a given city or country and calculate their average height. If we measure one million people, we can then replace one million different numbers with a single number representing this average.
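As a minimal sketch of descriptive statistics, Python's standard `statistics` module can reduce a list of measurements to a single average. The six heights below are invented stand-ins for the million in the example; the population standard deviation is added as one extra summary number that hints at the detail the average leaves out.

```python
from statistics import mean, pstdev

# Hypothetical heights in centimetres; a real study would have a million values.
heights_cm = [158.0, 162.5, 170.0, 171.5, 175.0, 188.0]

average = mean(heights_cm)   # the single number that replaces the whole list
spread = pstdev(heights_cm)  # one extra number hinting at what the average hides

print(f"{len(heights_cm)} values -> mean {average:.1f} cm, std dev {spread:.1f} cm")
```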

Any summary of information necessarily omits some details, because it is smaller than the original information. Therefore, if we use a statistical approach to summarize a collection of cultural artifacts or information about cultural behaviors, our findings will not apply to every item in the collection or every record of behavior. Similarly, mathematical models that describe patterns and relationships in cultural data and attempt to predict future data will only be correct some of the time. Such summaries and models describe only some patterns in a cultural dataset, omitting many others.

For example, the authors of the paper about the effects of Instagram image filters analyzed data on millions of photos and reported that “filtered photos are 21% more likely to be viewed and 45% more likely to be commented [on]” (Bakhshi, Shamma, Kennedy, & Gilbert, 2015). But certainly, this large sample also contains many filtered photos that were not viewed more frequently or did not receive more comments than unfiltered photos. This is the nature of statistical summaries and models: they describe general trends but usually can’t predict every single data point. For example, when measuring the height of a population, some people will be significantly taller or shorter than the average. Similarly, in the case of filtered Instagram photos, some filtered images are likely to receive significantly more views than the 21% average; others will receive the same number of views as unfiltered photos, and some will receive fewer views.
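The gap between an average effect and individual cases can be shown with a few invented numbers. In this sketch the view counts are hypothetical (they are not data from Bakhshi et al.) and were chosen so that the filtered photos show an average 21% lift over the unfiltered baseline, even though only one filtered photo actually beats that baseline.

```python
from statistics import mean

# Hypothetical view counts, chosen only to illustrate the point.
unfiltered_views = [100, 120, 80, 100]
filtered_views = [250, 100, 60, 74]

baseline = mean(unfiltered_views)           # average views of an unfiltered photo
lift = mean(filtered_views) / baseline - 1  # average relative gain from filtering

# Classify each filtered photo relative to the unfiltered average.
above = [v for v in filtered_views if v > baseline]
equal = [v for v in filtered_views if v == baseline]
below = [v for v in filtered_views if v < baseline]

print(f"average lift: {lift:.0%}")
print(f"above baseline: {above}, equal: {equal}, below: {below}")
```

The single summary number is accurate for the group, yet it describes none of the four photos individually.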

Of course, most qualitative cultural theories and histories are also summaries of information. They don’t describe every single artifact within a given genre, medium, or period, but present only certain trends. Specifically, they may propose that certain artistic techniques, themes, and conventions are common to a particular set of feature films, TV drama series, literary narratives, or works in other media, genres, and periods. Both the writers and the readers understand that these techniques, themes, and conventions are present in some of these artifacts, but not in all of them.

So, in this way, qualitative and quantitative studies of culture are similar — they account for only some of the information present in a cultural dataset. The artifacts that are not covered may contain other techniques, themes, structures, or conventions that qualitative theory or quantitative models do not describe. And even the artifacts that are covered are likely to have other patterns a particular theory or model does not describe.

The same goes for accounts of user experience. Qualitative theorists often talk about the effects of certain artifacts on readers, viewers, or listeners by extrapolating from their own singular experience, which is a very big generalization. Quantitative studies of social media instead rely on explicit signs and traces of the experience of a large number of users, such as likes and shares. While larger samples allow them to draw more accurate conclusions, such signs and traces can’t cover the full range of aesthetic experiences, and this is a very big limitation of this approach.

Here it is relevant to recall the famous essay S/Z by French literary critic and semiotician Roland Barthes, which analyzes Balzac’s short story Sarrasine (Barthes, 2002 [1973]). The essay does not claim to describe all semiotic codes in Balzac’s story. It also does not claim that they form a system, contrary to the earlier structuralist view in semiotics. Instead, in his analysis Barthes only scanned Balzac’s story, showing examples of many kinds of codes, without assuming that a comprehensive description of all of them, even in such a short story, is possible or desirable. This explicit recognition of the limits of any cultural reading remains very relevant today, as the use of computational and statistical methods for the analysis of cultural data is becoming more popular.

AGAINST REDUCTION 

For scientists, having a statistical model which can predict only some future data is 
not ideal, because science assumes that such a model should accurately capture 
characteristics of the phenomena being modeled. A model that is accurate 
90% of the time is better than one with only 60% accuracy. Having a model that 
predicts only some of the data means that we don’t fully understand the process 
that generates it, or that we did not capture all the relevant characteristics. (For 
example, in object and scene recognition research, computer vision scientists 
compete every year to see which model can most accurately recognize objects and 
types of scenes in a large number of photos [Russakovsky, et al., 2015]). 
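Accuracy in such comparisons is usually the share of items a model labels correctly. A minimal sketch with invented scene labels for ten photos, where one hypothetical model gets 9 of 10 right and another 6 of 10, reproduces the 90% versus 60% comparison above. Real benchmarks use far larger test sets and often additional metrics.

```python
def accuracy(predicted, actual):
    """Fraction of items whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical ground-truth scene labels and two models' guesses.
truth = ["beach", "street", "forest", "street", "beach",
         "kitchen", "forest", "street", "beach", "kitchen"]
model_a = ["beach", "street", "forest", "street", "beach",
           "kitchen", "forest", "street", "beach", "street"]
model_b = ["beach", "forest", "forest", "beach", "beach",
           "street", "forest", "kitchen", "beach", "kitchen"]

print(f"model A: {accuracy(model_a, truth):.0%}, model B: {accuracy(model_b, truth):.0%}")
```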

But we can also use AI and other computational methods to study culture without such assumptions, and this is part of the alternative research paradigm I refer to as Cultural Analytics (2007-) (Manovich, 2009). In this paradigm, we do not want to “explain” most or even some of the cultural data using simple mathematical models and treat the rest as “error” or “noise” because our models cannot account for it. We also do not want to assume that certain patterns such as the “hero’s journey,” the “golden ratio,” or “binary oppositions” exist in many cultural artifacts. In other words, the ultimate goal of Cultural Analytics is to avoid the reductive summarization typical of both traditional qualitative cultural theory and history and recent quantitative computational research.

To achieve this, we have been creating very high-resolution visualizations that show all artifacts in a visual dataset together, rather than quantifying only some dimensions of these artifacts and then building statistical models. In the visualizations, the artifacts are sorted in different ways according to properties such as color, texture, composition, or content. Our software measures such properties using techniques from computer vision, and then organizes all artifacts according to these measurements. Some reduction is therefore still inevitable: for example, if we organize 50,000 images shared on Instagram in Tokyo or Bangkok according to the time of day and average color saturation, the patterns in time and saturation will stand out, and other patterns will be harder to see. However, rather than producing a single visualization of each dataset, we create multiple visualizations where images are sorted according to many different properties, so each can reveal a different pattern.
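The idea of sorting one dataset in several ways can be sketched in a few lines. This is not the lab's software: it assumes each image is available as a list of RGB pixels plus a posting hour, uses the standard-library `colorsys` module to compute average HSV saturation as one possible color measure, and uses three invented miniature “images.”

```python
import colorsys
from statistics import mean

def average_saturation(pixels):
    """Mean HSV saturation (0-1) of an image given as a list of 8-bit RGB tuples."""
    return mean(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels)

# Hypothetical tiny "images": each has a posting hour and a few (R, G, B) pixels.
images = {
    "night_street": {"hour": 21, "pixels": [(120, 120, 120), (130, 128, 126)]},
    "sunset": {"hour": 19, "pixels": [(250, 120, 40), (230, 90, 60)]},
    "park": {"hour": 13, "pixels": [(60, 160, 70), (200, 210, 190)]},
}

# The same dataset ordered in two ways; each ordering foregrounds a different property.
by_time = sorted(images, key=lambda name: images[name]["hour"])
by_saturation = sorted(images, key=lambda name: average_saturation(images[name]["pixels"]))

print("by time of day:", by_time)
print("by saturation:", by_saturation)
```

Each ordering brings one property forward and pushes the others back, so producing several orderings is a cheap way to look at the same data from different sides.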








Top: 50,000 photos shared on Instagram in Bangkok. Bottom: 50,000 photos shared on Instagram in Tokyo. Photos were collected in early 2012. In each visualization, photos are sorted according to the day and time they were shared (top to bottom, left to right). See http://phototrails.info. Nadav Hochman, Lev Manovich, Jay Chow, 2013.

“To avoid reductive summarization” — does this mean that we are only interested in differences between artifacts and that we want to avoid any kind of reduction at all costs? To postulate the existence of cultural patterns is to accept at least some reduction in analyzing data. Without this, we cannot compare anything, unless we are dealing with extreme minimalism or seriality, where the artist keeps everything else equal and only varies values on one or two dimensions, as in the 1960s sculptures of Sol LeWitt.




Therefore, some reduction is actually desirable. And to claim that we see any pattern at all is to practice such reduction. We just need to be explicit about what our patterns reveal and what they leave invisible. Therefore, if we are to define Cultural Analytics as analysis and visualization of cultural patterns on different scales, we will need to immediately qualify this statement. While we want to discover repeating patterns in cultural artifacts and behaviors, we should always remember that they only account for some aspects of these artifacts and behaviors.

Unless it is an exact human-made copy of another cultural artifact, or is produced mechanically or algorithmically to be identical with other copies, every expression and interaction is unique, at least in some ways. In some cases, this uniqueness is not important for our analysis, and in other cases it is. For example, in our project Selfiecity, we automatically extracted many characteristics from thousands of Instagram self-portraits shared during the same week in six global cities (Manovich, et al., 2014-2015). Our interactive software allows you to sort these selfies according to parameters such as the angle and tilt of the head, degree of smile, age, gender, and others. For instance, you can select all female selfies from São Paulo that tilt their head more than 10 degrees, or all selfies from Bangkok with strong smiles (the strength of a smile was measured on a 0-100 scale). In this way, you can discover many trends in how people pose for their selfies and compare them between cities.
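Queries like these amount to filtering a table of extracted measurements. The sketch below assumes a hypothetical `Selfie` record with the fields mentioned in the text; the records, the sign convention for tilt, and the smile threshold of 80 are all invented for illustration and are not Selfiecity's actual data or code.

```python
from dataclasses import dataclass

@dataclass
class Selfie:
    """Hypothetical record of measurements extracted from one selfie."""
    city: str
    gender: str
    head_tilt_deg: float  # signed head tilt in degrees (sign convention assumed)
    smile: int            # smile strength on a 0-100 scale

selfies = [
    Selfie("São Paulo", "female", 14.0, 40),
    Selfie("São Paulo", "female", 3.0, 85),
    Selfie("São Paulo", "male", 22.0, 10),
    Selfie("Bangkok", "female", -5.0, 92),
    Selfie("Bangkok", "male", 1.0, 30),
]

# Female selfies from São Paulo whose head tilt exceeds 10 degrees in either direction.
tilted = [
    s for s in selfies
    if s.city == "São Paulo" and s.gender == "female" and abs(s.head_tilt_deg) > 10
]

# Selfies from Bangkok with a strong smile (the threshold of 80 is an assumption).
smiling = [s for s in selfies if s.city == "Bangkok" and s.smile > 80]

def share_smiling(city, threshold=80):
    """Fraction of a city's selfies whose smile score exceeds the threshold."""
    in_city = [s for s in selfies if s.city == city]
    return sum(s.smile > threshold for s in in_city) / len(in_city)

print(len(tilted), len(smiling), share_smiling("Bangkok"), share_smiling("São Paulo"))
```

The last function turns a filter into a per-city proportion, which is one simple way to compare cities.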



Screenshot from Selfiexploratory, an interactive application for exploring patterns in selfies developed as part of the Selfiecity project. In this view, selfies from São Paulo with extreme right tilt angles are selected. See http://selfiecity.net for more details. Lev Manovich, Moritz Stefaner, Mehrdad Yazdani, Dominikus Baur, Daniel Goddemeyer, Alise Tifentale, Nadav Hochman, Jay Chow, 2013


In our everyday perceptual experiences, we also constantly recognize common patterns. This is what allows us to judge that something is typical, unusual, or unique. Such recognition is also a form of reduction — we feel that something either fits into familiar categories we experienced previously or does not fit into any of them. But this is not all we do. I think the reason we do not get tired of looking at endless faces and bodies when we browse Instagram is that each of them is unique, and we notice this. What fascinates us is not repeating patterns but unique details and their combinations. No two human faces are exactly the same, and we enjoy noticing these differences.

The ultimate (perhaps utopian) vision of Cultural Analytics is to map in detail and understand the full diversity of contemporary professional and user-generated cultural artifacts created globally, i.e., to focus on what is different between numerous artifacts and not only on what they share (i.e., common patterns). In the 19th and 20th centuries, the lack of appropriate technologies to store, organize, and compare large cultural datasets contributed to the popularity of reductive cultural theories, for example, seeing every human culture as going through the same three stages (early development, reaching perfection, followed by decay). Such categorical schemes reduce the variability and messiness of actual cultural histories. Technological limitations in data management may also have contributed to the 19th-century obsession with cultural classification, another way to reduce the endless diversity of human cultural expressions to a manageable system.

SEEING DIFFERENCES 

Today we can use a single computer to capture, compare, quantify, and visualize thousands of differences between tens of millions of objects. We no longer have any excuse to focus only on what cultural artifacts or behaviors share (as members of a proposed cultural category or a cultural period), which is what we do when we categorize them or perceive them as instances of general types. Although we may have to start by extracting patterns just to draw our initial maps of contemporary cultural production and dynamics, given their scale, eventually these patterns may recede into the background or even completely dissolve, as we focus only on the differences between individual objects. This, to me, is the ultimate promise behind using AI methods and cultural datasets to observe and describe cultures in the 21st century.

We established our Cultural Analytics Lab in 2007 and spent the next 10 years assembling, analyzing, and visualizing dozens of datasets of artifacts in many genres of visual media, from pages of manga books, modernist paintings, and avant-garde films to images shared on Instagram, Twitter, and Flickr. As I learned, the Cultural Analytics ideals presented above are easy to state but challenging to execute well in practice. Human brains and natural languages are categorizing machines. Our perception constantly processes sensory information and categorizes it. Observing a pattern is like constructing a category: a recognition that some things, or some aspects of these things, have something in common. So, can we learn to think about cultural processes without categories? How can we hope to eventually dissolve the patterns that AI-based analysis will reveal in numerous artifacts to see only their differences? How can we refuse the very idea of summarization ingrained in us by a hundred years of statistics?

How do we move away from the assumption of the humanities (which until now “owned” thinking and writing about culture) that their goal is the discovery and interpretation of general cultural types, be they “modernism,” “narrative structures,” “images of the working class,” “selfies,” or “amateur digital photographers”? How do we instead learn to see cultures in more detail, without immediately looking for, and noticing, only types, structures, or patterns?

First, we need sufficiently large cultural data samples. Next, we can extract sufficiently large numbers of features that capture characteristics of the artifacts, their reception and use by audiences, and their circulation. (We also need to think more about how to represent cultural processes, interactions, and dynamics, especially since today we use interactive digital cultural media as opposed to historical static artifacts.) Once we have such datasets, we can explore them using a variety of unsupervised machine learning methods such as visualization of multidimensional data, clustering, and others.
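As one example of the clustering mentioned above, here is a plain k-means sketch in pure Python. The two features per item (hypothetical contrast and detail scores for six manga pages) are invented, and the naive initialization from the first k points is a simplification; real projects would typically use a library implementation and many more features. No labels are supplied: the groups emerge from the data alone.

```python
from math import dist
from statistics import mean

def kmeans(points, k, iterations=20):
    """Plain k-means: group points into k clusters without any predefined labels."""
    # Naive initialization from the first k points (a simplification, not k-means++).
    centers = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep it if empty).
        new_centers = [
            tuple(mean(column) for column in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved, so the clustering is stable
            break
        centers = new_centers
    return centers, clusters

# Hypothetical (contrast, detail) scores for six manga pages, both on a 0-1 scale.
pages = [(0.9, 0.1), (0.1, 0.9), (0.85, 0.15), (0.15, 0.8), (0.95, 0.05), (0.2, 0.85)]
centers, clusters = kmeans(pages, k=2)
for center, members in zip(centers, clusters):
    print([round(x, 2) for x in center], members)
```

The resulting groups have no names attached; interpreting what, if anything, they correspond to is left to the researcher.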

Supervised machine learning is commonly used in the culture industry to classify artifacts, people, and behaviors. Such classifications often reinforce already existing and taken-for-granted classifications and ways of seeing the world. For example, some photo sharing sites and apps automatically detect and tag familiar classes of objects and scenes in user photos (“cities,” “faces,” “food,” “landscapes,” “sea,” etc.).

Unsupervised machine learning methods allow us to discover new categories for which we don’t have names and to see connections we were not previously aware of. Thus, rather than reducing cultural data to familiar categories, unsupervised machine learning can expose the limitations of such categories and suggest new ways of seeing culture.

In this short book I have outlined some of the issues raised by the ongoing integration of AI into many parts of culture, as well as the use of AI to analyze it quantitatively. I have presented examples of AI use in digital devices and services and one possible taxonomy of these uses, as well as examples of computational analysis of digital culture. The key theme that guided this discussion was the question of cultural variability. Does AI integration in cultural production and reception lead to a decrease in aesthetic variability? Or does it, on the contrary, increase it? And what are the different ways in which aesthetic variability can be defined and measured? I see the last question as productive in itself, since it may open new ways of thinking about aesthetics in the digital era.



Visualization showing 1,074,790 unique pages from 883 manga series from Japan, Korea, and China. The pages are sorted according to their visual characteristics measured by computer vision software. The pages in the bottom part of the visualization are the most graphic and have the least amount of detail. The pages in the upper right have lots of detail and texture. The pages with the highest contrast are on the right, while pages with the least contrast are on the left. The visualization demonstrates the use of unsupervised machine learning for exploring large cultural data sets. Lev Manovich / Cultural Analytics Lab, 2010.








Our cultural period is characterized by an unprecedented scale of production and circulation, and also by global integration in cultural production, reception, and reuse. Today people around the world create, share, and interact with billions of new digital artifacts every day. We need new methods for seeing culture at its new scale, velocity, and connectivity that can combine both qualitative and quantitative approaches and that can reveal the full variability of this new ecosystem without reducing it to a small number of categories. AI plays a crucial role in this new global cultural ecosystem, suggesting to people whom to follow and what to see, helping them edit the media they create, making aesthetic decisions for them, determining how many people will see their content, deciding which ads will be shown to them, etc. Understanding the basic principles of the AI techniques used in culture today is therefore important if you want to be culturally literate, and essential if you are a creator yourself.